Detecting the First Hydration Shell Structure around Biomolecules at Interfaces

Understanding the role of water in biological processes remains a central challenge in the life sciences. Water structures in hydration shells of biomolecules are difficult to study in situ due to overwhelming background from aqueous environments. Biological interfaces introduce additional complexity because biomolecular hydration differs at interfaces compared to bulk solution. Here, we perform experimental and computational studies of chiral sum frequency generation (chiral SFG) spectroscopy to probe chirality transfer from a protein to the surrounding water molecules. This work reveals that chiral SFG probes the first hydration shell around the protein almost exclusively. We explain the selectivity to the first hydration shell in terms of the asymmetry induced by the protein structure and specific protein–water hydrogen-bonding interactions. This work establishes chiral SFG as a powerful technique for studying hydration shell structures around biomolecules at interfaces, presenting new possibilities to address grand research challenges in biology, including the molecular origins of life.


Description of the SFG spectrometer
A detailed description and characterization of the custom SFG spectrometer used in these studies can be found in Ma et al. 1 Briefly, a 5 kHz ultrafast regenerative amplifier (Spitfire, Spectra-Physics) is pumped by two Nd:YLF lasers (Empower, Spectra-Physics) and seeded with a Ti:Sapphire laser (MaiTai, Spectra-Physics). The amplifier produces 120 fs pulses with a total power of 6 W. A 50/50 beamsplitter divides the amplifier output along two paths. The first path (3 W) passes through a custom pulse shaper, yielding narrow-band pulses centered at 800 nm. The second path (3 W) pumps a tunable optical parametric amplifier (TOPAS, Spectra-Physics/Light Conversion) to generate broad-band infrared pulses. The resulting visible and infrared beams are spatially and temporally overlapped at the sample surface. The SFG signal reflected from the sample interface is dispersed with a monochromator, and the spectrum is collected with a CCD. For all experiments reported in the main text, the SFG, visible, and infrared beams were p-, s-, and p-polarized, respectively. 2

Experimental spectra: Sewn spectra
Figure S1. The two spectral windows collected for each spectrum presented in Figure 2 in the main text. The two individual spectra were overlaid, normalized to the N-H stretching peak of LK7b at ~3270 cm⁻¹, and stitched together at the point indicated with hash marks. The noisy regions at the high-frequency ends of the spectra are due to normalization of the broadband spectra by clean quartz. a.u.: arbitrary units.

Description of Fitting Experimental Spectra.
In the spectral region 2900-3800 cm⁻¹, C-H and N-H stretching of the LK7b protein, O-H stretching of water, and possibly additional signals due to Fermi resonance of water can contribute to the vibrational spectra (for more information on vibrational analyses of proteins, see Barth and Zscherp). 3 The experimental spectra in Figure 2b were fit with Lorentzians according to:

$$ I_{\mathrm{SFG}} \propto \left| \sum_q \frac{A_q}{\omega_{\mathrm{IR}} - \omega_q + i\Gamma_q} \right|^2 \qquad \text{(S1)} $$

where $\omega_q$ is the resonant frequency of the q-th vibrational mode, $A_q$ is its amplitude, $\Gamma_q$ is its half-width at half-maximum, and $\omega_{\mathrm{IR}}$ is the frequency of the incident infrared. Each vibrational peak therefore contributes three parameters ($A_q$, $\omega_q$, $\Gamma_q$). The minimum number of vibrational peaks needed to fit the experimental data (Figure 2b) was determined by residual analyses, as shown in Figure S2. Eleven peaks were used to fit the spectra, and they are tentatively assigned as shown in Table S1.

Figure S2. Fitting and residual analyses of the fitting to the experimental spectra of (a) LK7b-H2O and (b) LK7b-H2¹⁸O presented in Figure 2b in the main text with (from top to bottom) 11, 10, 9, and 8 vibrational peaks. Fitting parameters for the 11-peak fitting (top row) are reported in Table S1. Individual vibrational peaks of each fitting are shown as gray lines (these have been scaled to fit within the viewing window). Residuals between the experimental data and the fitting are included above each spectrum. The residual analyses suggest that 11 component vibrational resonances are needed to model the experimental spectra in Figure 2b of the main text.
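The Lorentzian lineshape model of equation S1 can be sketched in a few lines of Python. This is a minimal illustration, not the fitting code used for Figure S2: the function names and the example peak values are assumptions, the sign convention of the denominator is assumed, and no nonresonant background term is included because none is described in the text.

```python
def chi_resonant(omega_ir, peaks):
    """Sum of complex Lorentzians; peaks is a list of (A_q, omega_q, Gamma_q)."""
    return sum(a / (omega_ir - w + 1j * g) for a, w, g in peaks)


def sfg_intensity(omega_ir, peaks):
    """Model intensity, proportional to the squared modulus of the sum."""
    return abs(chi_resonant(omega_ir, peaks)) ** 2


def residual_sum_of_squares(frequencies, measured, peaks):
    """Figure of merit for comparing fits with different numbers of peaks."""
    return sum((y - sfg_intensity(x, peaks)) ** 2
               for x, y in zip(frequencies, measured))


def parameter_count(peaks):
    """Each peak contributes three parameters (A_q, omega_q, Gamma_q)."""
    return 3 * len(peaks)


# Hypothetical single peak near the N-H stretch, for illustration only.
example_peaks = [(10.0, 3270.0, 20.0)]
on_resonance = sfg_intensity(3270.0, example_peaks)   # (A / Gamma)^2
at_half_width = sfg_intensity(3290.0, example_peaks)  # half of the peak value
```

For a single isolated peak the intensity falls to half its maximum at a detuning of one Γ, which is why Γ is the half-width at half-maximum. An 11-peak model of this form has 33 adjustable parameters.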
Table S1. The fitting parameters (see equation S1, above) used for the 11-peak fitting shown in Figure S2. Water contributions are suggested based on H2¹⁸O substitution.

MD equilibration details
See the Methods section in the main text for more information about the MD simulations. The equilibration was initiated with an energy minimization of the solvent. This was followed by a 500 ps NVT equilibration of the solvent with the protein restrained (force constant of 500 kcal mol⁻¹ Å⁻²). Then the positions of the protein hydrogen atoms were energy-minimized. Subsequently, the system was minimized three times while gradually reducing the restraints on the protein heavy atoms (100, then 50, then 10 kcal mol⁻¹ Å⁻²), followed by an unrestrained energy minimization of the entire system. This minimization was followed by simulated annealing over 360 ps from 0 K to 298 K under NVT conditions and then by 6 ns of NVT equilibration at 298 K. No NPT equilibration was performed because the system contains a vacuum-water interface.
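The equilibration sequence above can be summarized as a small data structure, which makes the restraint schedule and the total dynamics time easy to check. This is only a bookkeeping sketch: the stage labels and dictionary keys are hypothetical, and the linear temperature ramp in `annealing_temperature` is an assumption, since the text gives only the end points and duration of the annealing.

```python
# Stages transcribed from the text; "ps" is dynamics time (0 for minimizations),
# "restraint" is the force constant on the protein in kcal mol^-1 A^-2.
EQUILIBRATION = [
    {"stage": "minimize solvent", "ps": 0},
    {"stage": "NVT, solvent with restrained protein", "ps": 500, "restraint": 500},
    {"stage": "minimize protein hydrogens", "ps": 0},
    {"stage": "minimize, heavy-atom restraints", "ps": 0, "restraint": 100},
    {"stage": "minimize, heavy-atom restraints", "ps": 0, "restraint": 50},
    {"stage": "minimize, heavy-atom restraints", "ps": 0, "restraint": 10},
    {"stage": "minimize entire system", "ps": 0},
    {"stage": "NVT simulated annealing, 0 K to 298 K", "ps": 360},
    {"stage": "NVT equilibration at 298 K", "ps": 6000},
]


def total_dynamics_ps(stages):
    """Total simulated time across all dynamics stages, in ps."""
    return sum(s["ps"] for s in stages)


def restraint_schedule(stages):
    """Restraint force constants in the order they are applied."""
    return [s["restraint"] for s in stages if "restraint" in s]


def annealing_temperature(t_ps, duration_ps=360.0, t_start=0.0, t_end=298.0):
    """Target temperature at time t_ps, ASSUMING a linear ramp."""
    frac = min(max(t_ps / duration_ps, 0.0), 1.0)
    return t_start + frac * (t_end - t_start)
```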

Selection methods for water subsets
The subsets in Figure 4 were selected using a combination of MDAnalysis selection logic, 4 in-house code, and the Voronoi tessellation library freud, 5 which uses the voro++ engine 6 to efficiently generate Voronoi diagrams. All selections were restricted to water molecules in the first hydration shell. A hydrogen bond was counted when the donor-acceptor heavy-atom distance was less than 3.5 Å and the donor-hydrogen-acceptor angle was greater than 135°. This distance cutoff is larger than is typically used to define hydrogen bonds; it was chosen to ensure that the non-hydrogen-bonded subsets were truly free from hydrogen-bonding interactions with the protein. Periodic boundary conditions were taken into account for all water selections, including the Voronoi tessellations and hydrogen-bonding analyses.
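The geometric hydrogen-bond criterion stated above (heavy-atom distance below 3.5 Å, donor-hydrogen-acceptor angle above 135°) can be sketched as follows. This is not the in-house code mentioned in the text; the function names are hypothetical, and the minimum-image treatment assumes an orthorhombic periodic box.

```python
import math


def minimum_image(vec, box):
    """Wrap a displacement vector into the nearest periodic image (orthorhombic box)."""
    return tuple(x - length * round(x / length) for x, length in zip(vec, box))


def displacement(a, b, box):
    """Minimum-image vector pointing from a to b."""
    return minimum_image(tuple(bj - aj for aj, bj in zip(a, b)), box)


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def is_hydrogen_bond(donor, hydrogen, acceptor, box,
                     max_distance=3.5, min_angle=135.0):
    """Apply the distance and angle cutoffs; coordinates and box in angstroms."""
    if norm(displacement(donor, acceptor, box)) >= max_distance:
        return False
    hd = displacement(hydrogen, donor, box)
    ha = displacement(hydrogen, acceptor, box)
    cos_angle = sum(p * q for p, q in zip(hd, ha)) / (norm(hd) * norm(ha))
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding error
    return math.degrees(math.acos(cos_angle)) > min_angle


box = (30.0, 30.0, 30.0)
linear = is_hydrogen_bond((0, 0, 0), (1, 0, 0), (2.8, 0, 0), box)
too_far = is_hydrogen_bond((0, 0, 0), (1, 0, 0), (3.6, 0, 0), box)
bent = is_hydrogen_bond((0, 0, 0), (0, 1, 0), (2.8, 0, 0), box)
across_boundary = is_hydrogen_bond((0.5, 0, 0), (29.6, 0, 0), (28.0, 0, 0), box)
```

The last example places the hydrogen and acceptor on the other side of the periodic boundary; the bond is still detected because every displacement is wrapped before the distance and angle are evaluated.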

Figure 4a (entire first hydration shell)
The freud library was used to generate a Voronoi diagram with points corresponding to all atoms in the system. The neighbor list feature was then used to identify water molecules with at least one atom's Voronoi cell bordering the Voronoi cell of at least one protein atom. All atoms in the water molecules were considered as possible neighbors to the protein.
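The neighbor-based definition of the first hydration shell can be illustrated with a short sketch that starts from a list of Voronoi-neighbor atom pairs, such as the neighbor list a Voronoi tessellation provides. The tessellation itself is not reproduced here; the function name, the input layout, and the toy data are assumptions made for illustration.

```python
def first_shell_waters(neighbor_pairs, is_protein, water_id):
    """Return ids of water molecules with at least one atom whose Voronoi cell
    borders the cell of a protein atom.

    neighbor_pairs: iterable of (i, j) atom-index pairs sharing a Voronoi face
    is_protein:     list of bools, one per atom
    water_id:       list with the water-molecule id per atom, or None for non-water
    """
    shell = set()
    for i, j in neighbor_pairs:
        # Check both orderings so the result does not depend on pair direction.
        for a, b in ((i, j), (j, i)):
            if is_protein[a] and water_id[b] is not None:
                shell.add(water_id[b])
    return shell


# Toy system: atoms 0-1 are protein; atoms 2-4, 5-7, 8-10 are waters 0, 1, 2.
is_protein = [True, True] + [False] * 9
water_id = [None, None, 0, 0, 0, 1, 1, 1, 2, 2, 2]
pairs = [(1, 3), (6, 0), (8, 5)]  # the last pair is a water-water contact
shell = first_shell_waters(pairs, is_protein, water_id)
```

Water 2 touches only another water molecule, so it is excluded. Any atom of a water molecule (O or H) is enough to place the whole molecule in the shell, matching the statement that all atoms in the water molecules were considered as possible neighbors.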

Figure 4b (backbone)
Backbone-associated water molecules were selected using the following MDAnalysis selection logic:
byres (resname HOH and (around 4.5 ((not resname HOH) and (name N H CA HA C O HN1 HN2))) and (not around 3.0 ((resname LEU) and not name N H CA HA C O HN1 HN2) or ((resname LYS) and not name N H CA HA C O HN1 HN2)))
This selection was fairly liberal in terms of backbone-associated water molecules but ensured that no water molecules closely associating with the sidechains were selected.
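Because the backbone selection repeats the same atom-name list and cutoffs several times, it can help to assemble the string programmatically. The sketch below reproduces the selection text quoted above from its 4.5 Å and 3.0 Å cutoffs; the helper name and its parameters are hypothetical and are not part of MDAnalysis.

```python
BACKBONE_NAMES = "N H CA HA C O HN1 HN2"


def backbone_water_selection(outer=4.5, inner=3.0, names=BACKBONE_NAMES):
    """Build the backbone-associated water selection string.

    outer: waters within this distance (angstroms) of backbone atoms are kept
    inner: waters within this distance of LEU/LYS sidechain atoms are rejected
    """
    return (
        f"byres (resname HOH and (around {outer} "
        f"((not resname HOH) and (name {names}))) "
        f"and (not around {inner} ((resname LEU) and not name {names}) "
        f"or ((resname LYS) and not name {names})))"
    )


selection = backbone_water_selection()

# With MDAnalysis installed, the string would be passed to select_atoms, e.g.
#   import MDAnalysis as mda
#   universe = mda.Universe("topology_file", "trajectory_file")  # placeholder names
#   backbone_waters = universe.select_atoms(selection)
```

Keeping the cutoffs as arguments makes it straightforward to test how sensitive a subset is to the choice of distances.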

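Figure 4c (sidechains)
Sidechain-associated water molecules were selected using the following MDAnalysis selection logic:
byres (resname HOH and (around 4.5 ((resname LYS) and not name N H CA HA C O HN1 HN2) or ((resname LEU) and not name N H CA HA C O HN1 HN2)) and (not around 3.0 (not resname HOH) and (name N H CA HA C O HN1 HN2)))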
This selection was fairly liberal in terms of sidechain-associated water molecules but ensured that no water molecules closely associating with the backbone were selected.

Figure 4d (backbone, not hydrogen bonded to protein)
First, molecules were preselected using the backbone MDAnalysis selection logic. Then, all water molecules hydrogen bonded to the C=O, NH, or NH3+ groups on the protein were identified and excluded, and the remaining water molecules not hydrogen bonded to the protein were considered in this selection. Both acceptor and donor scenarios were considered for hydrogen bonds with NH, although water donor configurations that met our hydrogen-bonding definition were rare. The terminal NH2 groups were also treated as NH groups.
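The exclusion step for the non-hydrogen-bonded subset amounts to a set difference between the preselected water molecules and every water molecule found to be hydrogen bonded to the protein. A minimal sketch, with hypothetical function and variable names and made-up water ids:

```python
def not_hydrogen_bonded(preselected, *hydrogen_bonded_sets):
    """Water ids in the preselection that appear in none of the hydrogen-bonded sets."""
    bonded = set().union(*hydrogen_bonded_sets)
    return set(preselected) - bonded


# Toy ids for illustration only.
backbone_waters = {1, 2, 3, 4, 5, 6}
bonded_to_co = {2, 9}    # waters hydrogen bonded to C=O groups
bonded_to_nh = {4}       # waters hydrogen bonded to NH (either direction)
bonded_to_nh3 = {6, 7}   # waters hydrogen bonded to NH3+ groups

free_backbone_waters = not_hydrogen_bonded(
    backbone_waters, bonded_to_co, bonded_to_nh, bonded_to_nh3
)
```

The same function applies unchanged to a sidechain preselection, since only the starting set differs.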

Figure 4e (backbone, hydrogen bonded to C=O)
First, molecules were preselected with the following MDAnalysis selection logic to reduce the number of water molecules needing to be processed by our hydrogen bond identification code (to improve performance with no effect on the result): byres ((resname HOH) and (around 3.5 (not resname HOH NA CL Na Cl))). Then, water molecules donating a hydrogen bond to a C=O group were identified.

Figure 4f (backbone, hydrogen bonded to NH)
The same procedure was followed as above with C=O groups, except that water molecules accepting hydrogen bonds from NH groups were considered. The water molecule donor scenario was not considered after it was found that these configurations are rare and produce a negligible SFG signal.

Figure 4g (backbone, hydrogen bonded to C=O with very short hydrogen bonds)
First, water molecules hydrogen bonded to C=O were identified. Then, the following MDAnalysis selection was used to identify water molecules forming very short hydrogen bonds to the C=O groups: byres ((resname HOH) and (around 1.6 (not resname HOH NA CL Na Cl))). No water molecules were found to form such short hydrogen bonds to any other functional group on the protein.

Figure 4h (sidechains, not hydrogen bonded to protein)
The same procedure was followed as for the backbone-associated, non-hydrogen-bonded water molecules, except that the sidechain selection logic was used.

Figure 4i (sidechains, hydrogen bonded to lysine NH3+)
The same procedure was followed as for the C=O groups and NH groups, except that water molecules accepting hydrogen bonds from NH3+ groups were identified.

Construction of the Hamiltonian for calculation of the SFG spectrum
The exciton Hamiltonian for the SFG spectrum calculation was constructed as illustrated in Figure S4. The main idea is to treat only the selected water molecules in the exciton Hamiltonian and to regard those water molecules, the other water molecules, and the protein as point charges influencing the local electric field, which in turn leads to transition dipoles and polarizabilities for each selected O-H group. All water selections began by selection of the first hydration shell (see Methods). The number of water molecules in the selection could fluctuate slightly between frames without negatively impacting the calculation because the calculation used the inhomogeneous limit approximation, where the signal is conceived as a simple average of individual frames.

Figure S4. A schematic of our adaptation of Skinner's electric field mapping approach to calculate SFG spectra arising from subsets of the water molecules in the system. Here, α is the transition polarizability tensor, µ is the transition dipole vector, U are the eigenvectors of the Hamiltonian, and λ are the eigenvalues. τ is the vibrational lifetime (1.3 ps here).

Figure S5. Water dipole moment vector directions at various depths (in Å) from the vacuum-water interface. The LK7b protein is shown in yellow. The blue lines are a guide for the eye. This figure shows that the lack of reflection plane perpendicular to the vacuum-water interface seen in Figure 6 in the main text extends to the bottom of the first hydration shell (~ -8 Å). The water dipole pattern becomes completely symmetric only at -12 to -14 Å, but the greatest asymmetry is between 0 and -8 Å. Both unit vector and vector with magnitude representations are shown.

Figure S6. Unnormalized relative magnitudes of the dipole moment vectors at various locations around LK7b. The ordering of water dipoles past the second hydration shell is minimal relative to the ordering near the protein (see Figure 6 in the main text for comparison).
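The inhomogeneous-limit averaging described in this section can be illustrated with a toy model of two coupled O-H oscillators per frame. This is a sketch under stated assumptions, not the calculation used in the paper: the electric-field maps that supply frequencies, couplings, transition dipoles, and polarizabilities are not reproduced; µ and α are reduced to one scalar component per O-H group; the sign convention of the denominator is assumed; and the lifetime-to-linewidth conversion assumes the amplitude decays as exp(-t/2τ). All function names are hypothetical.

```python
import math

C_CM_PER_PS = 2.99792458e-2  # speed of light in cm/ps


def hwhm_from_lifetime(tau_ps):
    """Lorentzian half-width (cm^-1), ASSUMING amplitude decay exp(-t / (2 tau))."""
    return 1.0 / (4.0 * math.pi * C_CM_PER_PS * tau_ps)


def eig_sym_2x2(h11, h22, h12):
    """Eigenvalues and orthonormal eigenvectors of [[h11, h12], [h12, h22]]."""
    theta = 0.5 * math.atan2(2.0 * h12, h11 - h22)
    c, s = math.cos(theta), math.sin(theta)
    lam1 = h11 * c * c + 2.0 * h12 * c * s + h22 * s * s
    lam2 = h11 * s * s - 2.0 * h12 * c * s + h22 * c * c
    return (lam1, lam2), ((c, s), (-s, c))


def frame_response(omega, frame, gamma):
    """Response of one frame: project mu and alpha onto each eigenvector of the
    frame Hamiltonian and sum Lorentzians centered at the eigenvalues."""
    h11, h22, h12, mu, alpha = frame
    eigenvalues, eigenvectors = eig_sym_2x2(h11, h22, h12)
    total = 0j
    for lam, vec in zip(eigenvalues, eigenvectors):
        mu_k = vec[0] * mu[0] + vec[1] * mu[1]
        alpha_k = vec[0] * alpha[0] + vec[1] * alpha[1]
        total += alpha_k * mu_k / (lam - omega - 1j * gamma)
    return total


def averaged_response(omega, frames, gamma):
    """Inhomogeneous limit: a simple average of the per-frame responses."""
    return sum(frame_response(omega, f, gamma) for f in frames) / len(frames)


gamma = hwhm_from_lifetime(1.3)  # about 2 cm^-1 for a 1.3 ps lifetime
eigenvalues, _ = eig_sym_2x2(3400.0, 3400.0, -20.0)
uncoupled = (3400.0, 3500.0, 0.0, (1.0, 0.0), (1.0, 0.0))
on_resonance = frame_response(3400.0, uncoupled, 2.0)
```

Because each frame is diagonalized and summed independently, the number of oscillators may differ from frame to frame without affecting the average, which is the property the text relies on when the selection size fluctuates.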